The OpenGrm open-source finite-state grammar software libraries

Authors

  • Brian Roark
  • Richard Sproat
  • Cyril Allauzen
  • Michael Riley
  • Jeffrey S. Sorensen
  • Terry Tai

Abstract

In this paper, we present a new collection of open-source software libraries. They provide command-line utilities, along with library classes and functions, for two tasks: compiling regular expressions and context-sensitive rewrite rules into finite-state transducers, and n-gram language modeling. The OpenGrm libraries use the OpenFst library to provide an efficient encoding of grammars and general algorithms for building, modifying and applying models.
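
The abstract gives no code. As a hedged illustration of what n-gram language modeling over weighted transducers involves, here is a minimal Python sketch of maximum-likelihood bigram estimation. It is not the OpenGrm API, since OpenGrm is a C++ library built on OpenFst. The functions `bigram_model` and `sentence_cost` and the `<s>`/`</s>` boundary symbols are hypothetical names chosen for this sketch. Each bigram is stored with the cost -log P(word | previous word), which is the kind of weight a tropical-semiring WFST places on an arc.

```python
import math
from collections import Counter


def bigram_model(sentences):
    """Hypothetical sketch: unsmoothed maximum-likelihood bigram costs.

    Not the OpenGrm NGram API. Each sentence is padded with
    hypothetical <s> and </s> boundary symbols.
    """
    history_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    # Cost of the "arc" prev -> word is -log P(word | prev),
    # as in a tropical-semiring weighted transducer.
    return {
        (prev, word): -math.log(count / history_counts[prev])
        for (prev, word), count in bigram_counts.items()
    }


def sentence_cost(model, sentence):
    """Sum of arc costs along the sentence path (KeyError if a bigram is unseen)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(model[(prev, word)] for prev, word in zip(tokens, tokens[1:]))


corpus = ["a b", "a c", "a b"]
model = bigram_model(corpus)
print(sentence_cost(model, "a b"))  # equals -log(2/3)
```

The sketch does no smoothing or backoff, so an unseen bigram raises `KeyError`. A practical n-gram model would reserve probability mass for such events.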

Similar resources

Why Implementation Matters: Evaluation of an Open-source Constraint Grammar Parser

In recent years, the problem of finite-state constraint grammar (CG) parsing has received renewed attention. Several compilers have been proposed to convert CG rules to finite-state transducers. While these formalisms serve their purpose as proofs of the concept, the performance of the generated transducers lags behind other CG implementations and taggers. In this paper, we argue that the fault...

Developing an Open-Source FST Grammar for Verb Chain Transfer in a Spanish-Basque MT System

This paper presents the current status of development of a finite-state transducer grammar for the verbal-chain transfer module in Matxin, a Rule Based Machine Translation system between Spanish and Basque. Due to the distance between Spanish and Basque, the verbal-chain transfer is a very complex module in the overall system. The grammar is compiled with foma, an open-source finite-state toolki...

OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language

Finite-state methods are well established in language and speech processing. OpenFst (available from www.openfst.org) is a free and open-source software library for building and using finite automata, in particular, weighted finite-state transducers (FSTs). This tutorial is an introduction to weighted finite-state transducers and their uses in speech and language processing. While there are othe...

Porting Basque Morphological Grammars to foma, an Open-Source Tool

Basque is a morphologically rich language, of which several finite-state morphological descriptions have been constructed, primarily using the Xerox/PARC finite-state tools. In this paper we describe the process of porting a previous description of Basque morphology to foma, an open-source finite-state toolkit compatible with Xerox tools, provide a comparison of the two tools, and contrast the ...

Distributed representation and estimation of WFST-based n-gram models

We present methods for partitioning a weighted finite-state transducer (WFST) representation of an n-gram language model into multiple blocks or shards, each of which is a stand-alone WFST n-gram model in its own right, allowing processing with existing algorithms. After independent estimation, including normalization, smoothing and pruning on each shard, the shards can be reassembled into a si...

Publication date: 2012